Dedicated to Friedrich Götze on the occasion of his sixtieth birthday

Abstract. As was shown recently by the authors, the entropy power
inequality can be reversed for independent summands with sufficiently
concave densities, when the distributions of the summands are put in
a special position. In this note it is proved that reversibility is
impossible over the whole class of convex probability distributions.
Related phenomena for identically distributed summands are also discussed.



1. The reversibility problem for the entropy power inequality 

Given a random vector $X$ in $\mathbb{R}^n$ with density $f$, introduce the
entropy functional (or Shannon's entropy)

$$ h(X) = -\int f(x) \log f(x)\, dx, $$

and the entropy power

$$ H(X) = e^{2h(X)/n}, $$

provided that the integral exists in the Lebesgue sense. For example, if $X$
is uniformly distributed in a convex body $A \subset \mathbb{R}^n$, we have

$$ h(X) = \log |A|, \qquad H(X) = |A|^{2/n}, $$

where $|A|$ stands for the $n$-dimensional volume of $A$.
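
As a small illustration of these definitions, the following sketch evaluates
$h(X)$ and $H(X)$ for the uniform distribution on a box, which is a convex
body. The function names and the particular box are hypothetical choices made
for this example only.

```python
import math

def uniform_box_entropy(sides):
    """h(X) = log|A| for X uniform on a box A with the given side lengths."""
    return sum(math.log(s) for s in sides)

def entropy_power(h, n):
    """H(X) = exp(2 h(X) / n) in dimension n."""
    return math.exp(2.0 * h / n)

sides = [2.0, 0.5, 3.0]            # hypothetical box in R^3, volume 3
h = uniform_box_entropy(sides)
H = entropy_power(h, len(sides))
print(h, H)                        # log 3 and 3 ** (2/3), up to rounding
```

The entropy power is thus the squared volume radius $|A|^{2/n}$, which is why
it scales like a variance under dilations.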

The entropy power inequality due to Shannon and Stam indicates that

$$ H(X + Y) \geq H(X) + H(Y), \tag{1.1} $$

for any two independent random vectors $X$ and $Y$ in $\mathbb{R}^n$ for which
the entropy is defined ([Sha, Sta], cf. also [CC, DOT, SV]). This is one of the
fundamental results in Information Theory, and it is of great interest to see
how sharp (1.1) is.
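
A one-dimensional sanity check of (1.1) can be sketched as follows. It assumes
$X$ and $Y$ uniform on $(0,1)$, so that $X + Y$ has the triangular density on
$(0,2)$, and it uses the standard closed form $2\pi e \sigma^2$ for the entropy
power of a normal law. The helper names and the midpoint rule are choices made
for this illustration.

```python
import math

def entropy_1d(density, a, b, steps=100_000):
    """Midpoint-rule approximation of -∫ f log f over (a, b)."""
    dx = (b - a) / steps
    total = 0.0
    for i in range(steps):
        f = density(a + (i + 0.5) * dx)
        if f > 0.0:
            total -= f * math.log(f) * dx
    return total

def triangular(x):
    """Density of X + Y for independent X, Y uniform on (0, 1)."""
    return x if x <= 1.0 else 2.0 - x

def gaussian_entropy_power(var):
    """H(Z) = 2*pi*e*var for a one-dimensional normal law of variance var."""
    return 2.0 * math.pi * math.e * var

H_X = H_Y = 1.0                      # uniform on (0, 1): h = log 1 = 0
h_sum = entropy_1d(triangular, 0.0, 2.0)
H_sum = math.exp(2.0 * h_sum)
print(H_sum, H_X + H_Y)              # strict inequality for uniform summands
print(gaussian_entropy_power(4.0),   # equality for normal summands
      gaussian_entropy_power(1.0) + gaussian_entropy_power(3.0))
```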
Equality here is achieved only when $X$ and $Y$ have normal distributions
with proportional covariance matrices. Note that the right-hand side is
unchanged when $X$ and $Y$ are replaced with their images under affine
volume-preserving transformations, that is, with random vectors

$$ \tilde X = T_1(X), \quad \tilde Y = T_2(Y) \qquad (|\det T_1| = |\det T_2| = 1). \tag{1.2} $$

On the other hand, the entropy power $H(\tilde X + \tilde Y)$ essentially
depends on the choice of $T_1$ and $T_2$. Hence, it is reasonable to consider
a formally improved variant of (1.1),

$$ \inf_{T_1, T_2} H(\tilde X + \tilde Y) \geq H(X) + H(Y), \tag{1.3} $$

where the infimum runs over all affine maps
$T_1, T_2 : \mathbb{R}^n \to \mathbb{R}^n$ subject to (1.2). (Note that one of
these maps may be taken to be the identity operator.) Now, equality in (1.3)
is achieved whenever $X$ and $Y$ have normal distributions with arbitrary
positive definite covariance matrices.

A natural question arises: When are both sides of (1.3) of a similar
order? For example, within a given class of probability distributions (of $X$
and $Y$), one wonders whether or not it is possible to reverse (1.3) to get

$$ \inf_{T_1, T_2} H(\tilde X + \tilde Y) \leq C\, (H(X) + H(Y)) \tag{1.4} $$

with some constant $C$.

The question is highly non-trivial already for the class of uniform
distributions on convex bodies, where it becomes equivalent (with a different
constant) to the inverse Brunn-Minkowski inequality

$$ \inf_{T_1, T_2} |\tilde A + \tilde B|^{1/n} \leq C \left( |A|^{1/n} + |B|^{1/n} \right). \tag{1.5} $$

Here $\tilde A + \tilde B = \{x + y : x \in \tilde A, \ y \in \tilde B\}$ stands
for the Minkowski sum of the images $\tilde A = T_1(A)$, $\tilde B = T_2(B)$ of
arbitrary convex bodies $A$ and $B$ in $\mathbb{R}^n$. To recover such an
equivalence, one takes for $X$ and $Y$ independent random vectors uniformly
distributed in $A$ and $B$. Although the distribution of $X + Y$ is not
uniform in $A + B$, there is a general entropy-volume relation

$$ \frac{1}{4}\, |A + B|^{2/n} \leq H(X + Y) \leq |A + B|^{2/n}, $$

which may also be applied to the images $\tilde A, \tilde B$ and
$\tilde X, \tilde Y$ (cf. [BM3]).

The inverse Brunn-Minkowski inequality (1.5) is indeed true and represents
a deep result in Convex Geometry discovered by V. D. Milman in the
mid 1980s (cf. [M1, M2, M3, Pis]). It has connections with high dimensional
phenomena, and we refer the interested reader to [BKM, KT, KM, AMO].
The questions concerning a possible description of the maps $T_1$ and $T_2$
and related isotropic properties of the normalized Gaussian measures are
discussed in [Bob2].

Based on (1.5), and involving Berwald's inequality in the form of C. Borell
[Bor1], the inverse entropy power inequality (1.4) has been established
recently [BM1, BM3] for the class of all probability distributions having
log-concave densities. Involving additionally a general submodularity property
of entropy [Mad], it also turned out to be possible to consider more general
densities of the form

$$ f(x) = V(x)^{-\beta}, \quad x \in \mathbb{R}^n, \tag{1.6} $$

where $V$ are positive convex functions on $\mathbb{R}^n$ and $\beta > n$ is a
given parameter. More precisely, the following statement can be found in [BM3].

Theorem 1.1. Let $X$ and $Y$ be independent random vectors in $\mathbb{R}^n$
with densities of the form (1.6) with $\beta \geq 2n + 1$,
$\beta \geq \beta_0 n$ ($\beta_0 > 2$). There exist linear volume preserving
maps $T_i : \mathbb{R}^n \to \mathbb{R}^n$ such that

$$ H(\tilde X + \tilde Y) \leq C_{\beta_0}\, (H(X) + H(Y)), \tag{1.7} $$

where $\tilde X = T_1(X)$, $\tilde Y = T_2(Y)$, and where $C_{\beta_0}$ is a
constant depending on $\beta_0$ only.

The question of what maps $T_1$ and $T_2$ can be used in Theorem 1.1 is
rather interesting, but certainly the maps that put the distributions of $X$
and $Y$ in $M$-position suffice (see [BM3] for terminology and discussion). In
a more relaxed form, one needs to have in some sense "similar" positions for
both distributions. For example, when considering identically distributed
random vectors, there is no need to appeal in Theorem 1.1 to some (not
very well understood) affine volume-preserving transformations, since the
distributions of $X$ and $Y$ have the same $M$-ellipsoid. In other words, we
have for $X$ and $Y$ drawn independently from the same distribution (under
the same assumption on the form of the density as in Theorem 1.1) that

$$ H(X + Y) \leq C_{\beta_0}\, (H(X) + H(Y)) = 2 C_{\beta_0} H(X). \tag{1.8} $$

Since the distributions of $X$ and $-Y$ also have the same $M$-ellipsoid, it is
also true that

$$ H(X - Y) \leq C_{\beta_0}\, (H(X) + H(Y)) = 2 C_{\beta_0} H(X). \tag{1.9} $$

We strengthen this observation by providing a quantitative version with
explicit constants below (under, however, a convexity condition on the
convolved measure). Moreover, one can give a short and relatively elementary
proof of it without appealing to Theorem 1.1.

Theorem 1.2. Let $X$ and $Y$ be independent identically distributed random
vectors in $\mathbb{R}^n$ with finite entropy. Suppose that $X - Y$ has a
probability density function of the form (1.6) with
$\beta \geq \max\{n + 1, \beta_0 n\}$ for some fixed $\beta_0 > 1$. Then

$$ H(X - Y) \leq D_{\beta_0} H(X) $$

and

$$ H(X + Y) \leq D_{\beta_0}^2 H(X), $$

where $D_{\beta_0} = \exp\left( \frac{2 \beta_0}{\beta_0 - 1} \right)$.

Let us return to Theorem 1.1 and the class of distributions involved there.
For growing $\beta$, the families (1.6) shrink and converge in the limit as
$\beta \to +\infty$ to the family of log-concave densities, which correspond
to the class of log-concave probability measures. Through inequalities of the
Brunn-Minkowski type, the latter class was introduced by A. Prékopa, while the
general case $\beta > n$ was studied by C. Borell [Bor2, Bor3], cf. also [BL,
Bob1]. In [Bor2, Bor3] it was shown that probability measures $\mu$ on
$\mathbb{R}^n$ with densities (1.6) (and only they, once $\mu$ is absolutely
continuous) satisfy the geometric inequality

$$ \mu(tA + (1-t)B) \geq \left[ t \mu(A)^\kappa + (1-t) \mu(B)^\kappa \right]^{1/\kappa} \tag{1.10} $$

for all $t \in (0, 1)$ and for all Borel measurable sets
$A, B \subset \mathbb{R}^n$, with negative power

$$ \kappa = -\frac{1}{\beta - n}. $$

Such $\mu$'s form the class of so-called $\kappa$-concave measures. In this
hierarchy the limit case $\beta = n$ corresponds to $\kappa = -\infty$ and
describes the largest class of measures on $\mathbb{R}^n$, called convex, in
which case (1.10) turns into

$$ \mu(tA + (1-t)B) \geq \min\{\mu(A), \mu(B)\}. $$

This inequality is often viewed as the weakest convexity hypothesis about a
given measure $\mu$.
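
To make (1.10) concrete, here is a minimal numerical sketch for the standard
Cauchy law on the line, whose density $1/(\pi(1+x^2))$ has the form (1.6) with
$V(x) = \sqrt{\pi(1+x^2)}$, $\beta = 2$ and $n = 1$, so that $\kappa = -1$.
The choice of example, the restriction to random intervals $A$, $B$, and the
function names are assumptions of this illustration, not part of the original
argument.

```python
import math
import random

def cauchy_measure(a, b):
    """mu([a, b]) for the standard Cauchy law, whose density is V(x)^(-2)
    with the convex function V(x) = sqrt(pi * (1 + x*x))."""
    return (math.atan(b) - math.atan(a)) / math.pi

def worst_gap(trials=20_000, seed=0):
    """Smallest value of LHS - RHS in (1.10) over random intervals A, B."""
    beta, n = 2.0, 1
    kappa = -1.0 / (beta - n)          # equals -1 for the Cauchy law
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        a1 = rng.uniform(-20.0, 20.0)
        a2 = a1 + rng.uniform(0.1, 20.0)
        b1 = rng.uniform(-20.0, 20.0)
        b2 = b1 + rng.uniform(0.1, 20.0)
        t = rng.uniform(0.01, 0.99)
        # tA + (1-t)B is again an interval when A and B are intervals
        lhs = cauchy_measure(t * a1 + (1 - t) * b1, t * a2 + (1 - t) * b2)
        rhs = (t * cauchy_measure(a1, a2) ** kappa
               + (1 - t) * cauchy_measure(b1, b2) ** kappa) ** (1.0 / kappa)
        worst = min(worst, lhs - rhs)
    return worst

print(worst_gap())   # expected to be nonnegative up to rounding
```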

One may naturally wonder whether or not it is possible to relax the
assumption on the range of $\beta$ in (1.7)-(1.9), or even to remove any
convexity hypotheses. In this note we show that this is impossible already for
the class of all one-dimensional convex probability distributions. Note that
in dimension one there are only two admissible linear transformations,
$\tilde X = X$ and $\tilde X = -X$, so that one just wants to estimate
$H(X + Y)$ or $H(X - Y)$ from above in terms of $H(X)$. As a result, the
following statement demonstrates that Theorem 1.1 and its particular cases
(1.8)-(1.9) are false over the full class of convex measures.

Theorem 1.3. For any constant $C$, there is a convex probability distribution
$\mu$ on the real line with a finite entropy, such that

$$ \min\{H(X + Y), H(X - Y)\} \geq C\, H(X), $$

where $X$ and $Y$ are independent random variables distributed according to
$\mu$.

A main reason for $H(X + Y)$ and $H(X - Y)$ to be much larger than $H(X)$
is that the distributions of the sum $X + Y$ and the difference $X - Y$ may
lose convexity properties when the distribution $\mu$ of $X$ is not
"sufficiently convex". For example, in terms of the convexity parameter
$\kappa$ (instead of $\beta$), the hypothesis of Theorem 1.1 is equivalent to

$$ \kappa \geq -\frac{1}{(\beta_0 - 1)\, n} \quad (\beta_0 > 2), \qquad \kappa \geq -\frac{1}{n + 1}. $$

That is, for growing dimension $n$ we require that $\kappa$ be sufficiently
close to zero (or the distributions of $X$ and $Y$ should be close to the class
of log-concave measures). These conditions ensure that the convolution of
$\mu$ with the uniform distribution on a proper (specific) ellipsoid remains
convex, and its convexity parameter can be controlled in terms of $\beta_0$
(a fact used in the proof of Theorem 1.1). However, even if $\kappa$ is close
to zero, one cannot guarantee that $X + Y$ or $X - Y$ would have convex
distributions.

We prove Theorem 1.2 in Section 2 and Theorem 1.3 in Section 3, and then
conclude in Section 4 with remarks on the relationship between Theorem 1.3
and recent results about Cramér's characterization of the normal law.



2. A "difference measure" inequality for convex measures

Given two convex bodies $A$ and $B$ in $\mathbb{R}^n$, introduce
$A - B = \{x - y : x \in A, \ y \in B\}$. In particular, $A - A$ is called the
"difference body" of $A$. Note that it is always symmetric about the origin.

The Rogers-Shephard inequality [RS] states that, for any convex body
$A \subset \mathbb{R}^n$,

$$ |A - A| \leq C_{2n}^n\, |A|, \tag{2.1} $$

where $C_n^k = \frac{n!}{k!\,(n-k)!}$ denote the usual combinatorial
coefficients. Observe that putting the Brunn-Minkowski inequality and (2.1)
together immediately yields that

$$ 2 \leq \frac{|A - A|^{1/n}}{|A|^{1/n}} \leq \left( C_{2n}^n \right)^{1/n} < 4, $$

which severely constrains the volume radius of the difference body of $A$
relative to that of $A$ itself. In analogy to the Rogers-Shephard inequality,
we ask the following question for the entropy of convex measures.
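
The numerical range of the Rogers-Shephard constant can be checked with a short
sketch. It evaluates $(C_{2n}^n)^{1/n}$ through the log-gamma function; the
function name and the sample dimensions are hypothetical choices for this
illustration.

```python
import math

def rogers_shephard_root(n):
    """(C_{2n}^n)^(1/n), computed through log-gamma to avoid huge integers."""
    log_binom = math.lgamma(2 * n + 1) - 2.0 * math.lgamma(n + 1)
    return math.exp(log_binom / n)

roots = {n: rogers_shephard_root(n) for n in (1, 2, 3, 10, 100, 10_000)}
for n, r in roots.items():
    print(n, r)      # values lie between 2 and 4 and increase with n
```

For $n = 1$ the value is $2$, matching the fact that the difference body of an
interval has exactly twice its length.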

Question. Let $X$ and $Y$ be independent random vectors in $\mathbb{R}^n$,
which are identically distributed with density $V^{-\beta}$, with $V$ positive
convex, and $\beta \geq n + \gamma$. For what range of $\gamma > 0$ is it true
that $H(X - Y) \leq C_\gamma H(X)$, for some constant $C_\gamma$ depending only
on $\gamma$?

Theorems 1.2 and 1.3 partially answer this question. To prove the former,
we need the following lemma about convex measures, proved in [BM2].

Lemma 2.1. Fix $\beta_0 > 1$. Assume a random vector $X$ in $\mathbb{R}^n$ has
a density $f = V^{-\beta}$, where $V$ is a positive convex function on the
supporting set. If $\beta \geq n + 1$ and $\beta \geq \beta_0 n$, then

$$ \log \|f\|_\infty^{-1} \leq h(X) \leq c_{\beta_0} n + \log \|f\|_\infty^{-1}, \tag{2.2} $$

where one can take for the constant $c_{\beta_0} = \frac{\beta_0}{\beta_0 - 1}$.

In other words, for sufficiently convex probability measures, the entropy
may be related to the $L^\infty$-norm $\|f\|_\infty = \sup_x f(x)$ of the
density $f$ (which is necessarily finite). Observe that the left inequality in
(2.2) is general: it trivially holds without any convexity assumption. On the
other hand, the right inequality is an asymptotic version of a result from
[BM2] about the extremal role of the multidimensional Pareto distributions.

Now, let $f$ denote the density of the random variable $W = X - Y$ in
Theorem 1.2. It is symmetric (even) and thus maximized at zero, by the
convexity hypothesis. Hence, by Lemma 2.1,

$$ h(W) \leq \log \|f\|_\infty^{-1} + c_{\beta_0} n = \log f(0)^{-1} + c_{\beta_0} n. $$

But, if $p$ is the density of $X$, then
$f(0) = \int_{\mathbb{R}^n} p(x)^2\, dx$, and hence

$$ \log f(0)^{-1} = -\log \int p(x) \cdot p(x)\, dx \leq \int p(x)\, [-\log p(x)]\, dx = h(X) $$

by using Jensen's inequality. Combining the above two displays gives
$h(X - Y) \leq h(X) + c_{\beta_0} n$, that is,
$H(X - Y) \leq e^{2 c_{\beta_0}} H(X)$, which is the first part of Theorem 1.2.
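
The Jensen step can be illustrated numerically. The sketch below assumes the
one-dimensional exponential density $p(x) = e^{-x}$ on $(0, \infty)$, chosen
only for this example, for which $f(0) = \int p^2 = 1/2$ and $h(X) = 1$. The
helper names, the truncation of the integrals at $50$, and the midpoint rule
are assumptions of the illustration.

```python
import math

def integrate(fn, a, b, steps=200_000):
    """Midpoint rule for the integral of fn over (a, b)."""
    dx = (b - a) / steps
    return sum(fn(a + (i + 0.5) * dx) for i in range(steps)) * dx

def p(x):
    """Density of the standard exponential law on (0, infinity)."""
    return math.exp(-x)

f0 = integrate(lambda x: p(x) ** 2, 0.0, 50.0)                  # f(0) = ∫ p^2
h_X = integrate(lambda x: -p(x) * math.log(p(x)), 0.0, 50.0)    # h(X)
print(-math.log(f0), h_X)    # about log 2 and 1, so log f(0)^(-1) <= h(X)
```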

To obtain the second part, we need an observation from [MK] that follows 
from the following lemma on the submodularity of the entropy of sums 
proved in [Mad]. 

Lemma 2.2. Given independent random vectors $X, Y, Z$ in $\mathbb{R}^n$ with
absolutely continuous distributions, we have

$$ h(X + Y + Z) + h(Z) \leq h(X + Z) + h(Y + Z), $$

provided that all entropies are well-defined and finite.

Taking $X$, $Y$ and $-Z$ to be identically distributed, and using the
monotonicity of entropy (after adding an independent summand), we obtain

$$ h(X + Y) + h(Z) \leq h(X + Y + Z) + h(Z) \leq h(X + Z) + h(Y + Z). $$

Since $h(Z) = h(-Z) = h(X)$, while $X + Z$ and $Y + Z$ have the same
distribution as $X - Y$, it follows that

$$ h(X + Y) + h(X) \leq 2 h(X - Y). $$

(This is the relevant observation from [MK].) Combining this bound with
the first part of Theorem 1.2 immediately gives the second part.

It would be more natural to state Theorem 1.2 under a shape condition
on the distribution of $X$ rather than on that of $X - Y$, but for this we
need to have a better understanding of the convexity parameter of the
convolution of two $\kappa$-concave measures when $\kappa < 0$.

Observe that in the log-concave case of Theorem 1.2 (which is the case of
$\beta \to \infty$, but can easily be derived directly in the same way without
taking a limit), one can impose only a condition on the distribution of $X$
(rather than that of $X - Y$), since closedness under convolution is
guaranteed by the Prékopa-Leindler inequality.

Corollary 2.3. Let $X$ and $Y$ be independent identically distributed random
vectors in $\mathbb{R}^n$ with log-concave densities. Then

$$ h(X - Y) \leq h(X) + n, $$
$$ h(X + Y) \leq h(X) + 2n. $$

In particular, observe that putting the entropy power inequality (1.1) and
Corollary 2.3 together immediately yields that

$$ 2 \leq \frac{H(X - Y)}{H(X)} \leq e^2, $$

which severely constrains the entropy power of the "difference measure" of
$\mu$ relative to that of $\mu$ itself.
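
As a hedged numerical check of Corollary 2.3 in dimension one, take $X$ and
$Y$ independent standard exponential variables (a log-concave law chosen for
this illustration). Then $X - Y$ has the Laplace density $\frac12 e^{-|x|}$
and $X + Y$ has the density $x e^{-x}$ on $(0, \infty)$. The helper name, the
truncation of the supports, and the midpoint rule are assumptions of this
sketch.

```python
import math

def entropy_1d(density, a, b, steps=400_000):
    """Midpoint-rule approximation of -∫ f log f over (a, b)."""
    dx = (b - a) / steps
    total = 0.0
    for i in range(steps):
        f = density(a + (i + 0.5) * dx)
        if f > 0.0:
            total -= f * math.log(f) * dx
    return total

h_X = 1.0                                                  # exponential law, closed form
h_diff = entropy_1d(lambda x: 0.5 * math.exp(-abs(x)), -50.0, 50.0)   # law of X - Y
h_sum = entropy_1d(lambda x: x * math.exp(-x), 0.0, 60.0)             # law of X + Y
ratio = math.exp(2.0 * (h_diff - h_X))                     # H(X - Y) / H(X), n = 1

print(h_diff, h_X + 1.0)     # h(X - Y) <= h(X) + n
print(h_sum, h_X + 2.0)      # h(X + Y) <= h(X) + 2n
print(ratio)                 # lies between 2 and e^2
```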

3. Proof of Theorem 1.3 

Given a (large) parameter $b > 1$, let a random variable $X_b$ have a
truncated Pareto distribution $\mu$, namely, with the density

$$ f(x) = \frac{1}{x \log b}\, 1_{\{1 < x < b\}}(x). $$

By the construction, $\mu$ is supported on a bounded interval $(1, b)$ and is
convex.

First we are going to test the inequality

$$ H(X_b + Y_b) \leq C\, H(X_b) \tag{3.1} $$

for growing $b$, where $Y_b$ is an independent copy of $X_b$. Note that

$$ h(X_b) = \int_1^b f(x) \log(x \log b)\, dx = \log\log b + \frac{1}{\log b} \int_1^b \frac{\log x}{x}\, dx = \log\log b + \frac{1}{2} \log b, $$

so $H(X_b) = b \log^2 b$.
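
The closed form for $h(X_b)$ can be confirmed numerically. The sketch below
integrates in the variable $u = \log x$, in which $X_b$ becomes uniform on
$(0, \log b)$; the function names and the value $b = 10^6$ are hypothetical
choices for this illustration.

```python
import math

def pareto_entropy_numeric(b, steps=100_000):
    """-∫ f log f for f(x) = 1/(x log b) on (1, b), integrated in u = log x."""
    L = math.log(b)
    du = L / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * du
        # f(x) dx = du / L and log(1/f(x)) = u + log L
        total += (u + math.log(L)) * du / L
    return total

def pareto_entropy_closed(b):
    """log log b + (1/2) log b."""
    L = math.log(b)
    return math.log(L) + 0.5 * L

b = 1e6
h_num = pareto_entropy_numeric(b)
h_cl = pareto_entropy_closed(b)
print(h_num, h_cl)
print(math.exp(2.0 * h_cl), b * math.log(b) ** 2)   # H(X_b) = b log^2 b
```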

Now, let us compute the convolution of $f$ with itself. The sum $X_b + Y_b$
takes values in the interval $(2, 2b)$. Given $2 < x < 2b$, we have

$$ g(x) = (f * f)(x) = \int_{-\infty}^{+\infty} f(x - y) f(y)\, dy = \frac{1}{\log^2 b} \int_\alpha^\beta \frac{dy}{(x - y)\, y}, $$

SERGEY G. BOBKOV AND MOKSHAY M. MADIMAN 



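where the limits of integration are determined to satisfy the constraints
$1 < y < b$, $1 < x - y < b$. So,

$$ \alpha = \max(1, x - b), \qquad \beta = \min(b, x - 1), $$

and using $\frac{1}{(x - y)\, y} = \frac{1}{x} \left( \frac{1}{y} + \frac{1}{x - y} \right)$,
we find that

$$ g(x) = \frac{1}{x \log^2 b}\, \log \frac{y}{x - y}\, \bigg|_{y = \alpha}^{\beta} = \frac{1}{x \log^2 b} \left( \log \frac{\beta}{\alpha} - \log \frac{x - \beta}{x - \alpha} \right). $$

Note that $x - \alpha = x - \max(1, x - b) = \min(b, x - 1) = \beta$, and
similarly $x - \beta = \alpha$. Hence,

$$ g(x) = \frac{2}{x \log^2 b} \log \frac{\beta}{\alpha} = \frac{2}{x \log^2 b} \log \frac{\min(b, x - 1)}{\max(1, x - b)}. $$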

Equivalently,

$$ g(x) = \frac{2}{x \log^2 b} \log(x - 1), \quad \text{for } 2 < x \leq b + 1, $$

$$ g(x) = \frac{2}{x \log^2 b} \log \frac{b}{x - b}, \quad \text{for } b + 1 \leq x < 2b. $$
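
A quick numerical cross-check of this closed form: the sketch below compares
it with a direct midpoint-rule evaluation of the convolution integral at one
point of each branch, and verifies that $g$ integrates to one. The function
names, the value $b = 100$, and the step counts are assumptions made for the
illustration.

```python
import math

def g_closed(x, b):
    """Closed-form density of X_b + Y_b on (2, 2b)."""
    L2 = math.log(b) ** 2
    if 2.0 < x <= b + 1.0:
        return 2.0 * math.log(x - 1.0) / (x * L2)
    if b + 1.0 < x < 2.0 * b:
        return 2.0 * math.log(b / (x - b)) / (x * L2)
    return 0.0

def g_numeric(x, b, steps=100_000):
    """Midpoint rule for (1/log^2 b) * ∫ dy / ((x - y) y) over alpha < y < beta."""
    lo, hi = max(1.0, x - b), min(b, x - 1.0)
    dy = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * dy
        total += dy / ((x - y) * y)
    return total / math.log(b) ** 2

b = 100.0
for x in (50.0, 150.0):                       # one point in each branch
    print(x, g_closed(x, b), g_numeric(x, b))

steps = 200_000
dx = (2.0 * b - 2.0) / steps
mass = sum(g_closed(2.0 + (i + 0.5) * dx, b) for i in range(steps)) * dx
print(mass)                                   # total mass, close to 1
```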

Now, on the second interval $b + 1 < x < 2b$, we have

$$ g(x) \leq \frac{2 \log b}{x \log^2 b} = \frac{2}{x \log b} \leq \frac{2}{(b + 1) \log b} < 1, $$

where the last bound holds for $b \geq e$, for example. Similarly, on the first
interval $2 < x < b + 1$, using $\log(x - 1) \leq \log b$, we get

$$ g(x) \leq \frac{2}{x \log b} \leq \frac{1}{\log b} \leq 1. $$

Thus, as soon as $b \geq e$, we have $g \leq 1$ on the support interval. From
this,

$$ h(X_b + Y_b) = \int_2^{2b} g(x) \log(1/g(x))\, dx \geq \int_2^{b} g(x) \log(1/g(x))\, dx. $$

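Next, using on the first interval the bound
$g(x) \leq \frac{2}{x \log b} \leq \frac{1}{x}$, valid for $b \geq e^2$, we get
for such values of $b$ that

$$ h(X_b + Y_b) \geq \int_2^b g(x) \log x\, dx = \frac{2}{\log^2 b} \int_2^b \frac{\log(x - 1) \log x}{x}\, dx. $$

To further simplify, we may write $x - 1 \geq \frac{x}{2}$, which gives

$$ \begin{aligned}
\int_2^b \frac{\log(x - 1) \log x}{x}\, dx
&\geq \int_2^b \frac{\log^2 x}{x}\, dx - \log 2 \int_2^b \frac{\log x}{x}\, dx \\
&= \frac{1}{3} \left( \log^3 b - \log^3 2 \right) - \frac{\log 2}{2} \left( \log^2 b - \log^2 2 \right) \\
&\geq \frac{1}{3} \log^3 b - \frac{\log 2}{2} \log^2 b.
\end{aligned} $$

Hence, $h(X_b + Y_b) \geq \frac{2}{3} \log b - \log 2$, and so

$$ H(X_b + Y_b) \geq \frac{1}{4}\, b^{4/3} \qquad (b \geq e^2). $$

In particular,

$$ \frac{H(X_b + Y_b)}{H(X_b)} \geq \frac{b^{1/3}}{4 \log^2 b} \to +\infty, \quad \text{as } b \to +\infty. $$

Hence, the inequality (3.1) cannot hold for large $b$ with any prescribed
value of $C$.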
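
The growth of the ratio can be made tangible with a short sketch. It evaluates
the lower bound $b^{1/3} / (4 \log^2 b)$ for a few values of $b$, and compares
a numerically computed $h(X_b + Y_b)$ with the bound
$\frac{2}{3} \log b - \log 2$ for one moderate $b$. The function names, the
sample values of $b$, and the integration in the variable $u = \log x$ are
assumptions of this illustration.

```python
import math

def g(x, b):
    """Closed-form density of X_b + Y_b on (2, 2b)."""
    L2 = math.log(b) ** 2
    if 2.0 < x <= b + 1.0:
        return 2.0 * math.log(x - 1.0) / (x * L2)
    if b + 1.0 < x < 2.0 * b:
        return 2.0 * math.log(b / (x - b)) / (x * L2)
    return 0.0

def entropy_of_sum(b, steps=200_000):
    """Midpoint rule for ∫ g log(1/g) dx over (2, 2b), in the variable u = log x."""
    u0, u1 = math.log(2.0), math.log(2.0 * b)
    du = (u1 - u0) / steps
    total = 0.0
    for i in range(steps):
        x = math.exp(u0 + (i + 0.5) * du)
        gx = g(x, b)
        if gx > 0.0:
            total += gx * math.log(1.0 / gx) * x * du
    return total

def lower_bound_ratio(b):
    """b^(1/3) / (4 log^2 b), the lower bound on H(X_b + Y_b) / H(X_b)."""
    return b ** (1.0 / 3.0) / (4.0 * math.log(b) ** 2)

b = 1e4
h_sum = entropy_of_sum(b)
print(h_sum, (2.0 / 3.0) * math.log(b) - math.log(2.0))
for big in (1e6, 1e12, 1e24, 1e48):
    print(big, lower_bound_ratio(big))     # grows without bound, but slowly
```

The bound exceeds a prescribed $C$ only for rather large $b$, which reflects
the logarithmic factor in the denominator.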

To test the second bound

$$ H(X_b - Y_b) \leq C\, H(X_b), \tag{3.2} $$

one may use the previous construction. The random variable $X_b - Y_b$ can
take any value in the interval $|x| < b - 1$, where it is described by the
density

$$ h(x) = \int_{-\infty}^{+\infty} f(x + y) f(y)\, dy = \frac{1}{\log^2 b} \int_\alpha^\beta \frac{dy}{(x + y)\, y}. $$

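Here the limits of integration are determined to satisfy $1 < y < b$ and
$1 < x + y < b$. So, assuming for simplicity that $0 < x < b - 1$, the limits
are

$$ \alpha = 1, \qquad \beta = b - x. $$

Writing $\frac{1}{(x + y)\, y} = \frac{1}{x} \left( \frac{1}{y} - \frac{1}{x + y} \right)$,
we find that

$$ h(x) = \frac{1}{x \log^2 b} \Big( \log y - \log(x + y) \Big) \bigg|_{y = \alpha}^{\beta} = \frac{1}{x \log^2 b} \log \frac{(b - x)(x + 1)}{b}. $$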

It should also be clear that

$$ h(0) = \frac{1}{\log^2 b} \int_1^b \frac{dy}{y^2} = \frac{1 - 1/b}{\log^2 b}. $$
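
The density of the difference can be cross-checked in the same way as that of
the sum. The sketch below compares the closed form, including the value at
zero, with a midpoint-rule evaluation of the defining integral; the function
names, the value $b = 100$, and the test points are hypothetical choices.

```python
import math

def diff_density_closed(x, b):
    """Closed form of the density of X_b - Y_b at 0 <= x < b - 1."""
    L2 = math.log(b) ** 2
    if x == 0.0:
        return (1.0 - 1.0 / b) / L2
    return math.log((b - x) * (x + 1.0) / b) / (x * L2)

def diff_density_numeric(x, b, steps=100_000):
    """Midpoint rule for (1/log^2 b) * ∫ dy / ((x + y) y) over 1 < y < b - x."""
    lo, hi = 1.0, b - x
    dy = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * dy
        total += dy / ((x + y) * y)
    return total / math.log(b) ** 2

b = 100.0
for x in (0.0, 3.0, 40.0):
    print(x, diff_density_closed(x, b), diff_density_numeric(x, b))
```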

Using $\log \frac{(b - x)(x + 1)}{b} \leq \log(x + 1) \leq x$, we obtain that
$h(x) \leq \frac{1}{\log^2 b} \leq 1$ for $b \geq e^2$. In this range, since
$\frac{(b - x)(x + 1)}{b} \leq b$, we also have that
$h(x) \leq \frac{1}{x \log b} \leq \frac{1}{x}$.

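Hence, in view of the symmetry of the distribution of $X_b - Y_b$,

$$ \begin{aligned}
h(X_b - Y_b) &= 2 \int_0^{b-1} h(x) \log(1/h(x))\, dx \\
&\geq 2 \int_0^{b/2} h(x) \log x\, dx \\
&= \frac{2}{\log^2 b} \int_0^{b/2} \frac{\log x}{x}\, \log \frac{(b - x)(x + 1)}{b}\, dx.
\end{aligned} $$

But for $0 < x < b/2$,

$$ \log \frac{(b - x)(x + 1)}{b} \geq \log \frac{x + 1}{2} \geq \log x - \log 2, $$

so, since the range $0 < x < 2$ contributes only a term of order
$1/\log^2 b$,

$$ \begin{aligned}
h(X_b - Y_b) &\geq \frac{2}{\log^2 b} \int_2^{b/2} \frac{\log^2 x - \log 2\, \log x}{x}\, dx + O\!\left( \frac{1}{\log^2 b} \right) \\
&= \frac{2}{\log^2 b} \left( \frac{1}{3} \left( \log^3(b/2) - \log^3 2 \right) - \frac{\log 2}{2} \left( \log^2(b/2) - \log^2 2 \right) \right) + O\!\left( \frac{1}{\log^2 b} \right) \\
&\sim \frac{2}{3} \log b.
\end{aligned} $$

Therefore, as in the previous step, $H(X_b - Y_b)$ is bounded from below by
a function which is equivalent to $b^{4/3}$. Thus, for large $b$, the
inequality (3.2) cannot hold either.
Theorem 1.3 is proved.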

4. Remarks 

For a random variable $X$ having a density, consider the entropic distance
from the distribution of $X$ to normality

$$ D(X) = h(Z) - h(X), $$

where $Z$ is a normal random variable with parameters $EZ = EX$,
$\mathrm{Var}(Z) = \mathrm{Var}(X)$. This functional is well-defined for the
class of all probability distributions on the line with finite second moment,
and in general $0 \leq D(X) \leq +\infty$.

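The entropy power inequality implies that, for independent $X$ and $Y$,

$$ D(X + Y) \leq \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\, D(X) + \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\, D(Y) \leq \max(D(X), D(Y)), \tag{4.1} $$

where $\sigma_1^2 = \mathrm{Var}(X)$, $\sigma_2^2 = \mathrm{Var}(Y)$.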
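
For a concrete instance of (4.1), one may take $X$ and $Y$ independent and
uniform on $(0, 1)$, an example chosen only for this illustration. Then
$h(X) = 0$ and $\mathrm{Var}(X) = 1/12$, while $X + Y$ has the triangular
density with $h(X + Y) = 1/2$ and variance $1/6$. The function name below is
hypothetical; the normal entropy $\frac12 \log(2 \pi e \sigma^2)$ is the
standard closed form.

```python
import math

def entropic_distance(h, var):
    """D(X) = h(Z) - h(X), where Z is normal with variance var,
    so that h(Z) = 0.5 * log(2 * pi * e * var)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var) - h

D_X = entropic_distance(0.0, 1.0 / 12.0)      # uniform on (0, 1)
D_sum = entropic_distance(0.5, 1.0 / 6.0)     # triangular law of X + Y
print(D_X, D_sum)                             # D(X + Y) is smaller than D(X)
```

With equal variances the weights in (4.1) are both $1/2$, so the inequality
reduces to $D(X + Y) \leq D(X)$ in this example.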
In turn, if $X$ and $Y$ are identically distributed, then Theorem 1.3 reads
as follows: For any positive constant $c$, there exists a convex probability
measure $\mu$ on $\mathbb{R}$ with $X$, $Y$ independently distributed according
to $\mu$, with

$$ D(X \pm Y) \leq D(X) - c. $$

This may be viewed as a strengthened variant of (4.1). That is, in
Theorem 1.3 we needed to show that both $D(X + Y)$ and $D(X - Y)$ may be
much smaller than $D(X)$ in the additive sense. In particular, $D(X)$ has
to be very large when $c$ is large. For example, in our construction of the
previous section

$$ E X_b = \frac{b - 1}{\log b}, \qquad E X_b^2 = \frac{b^2 - 1}{2 \log b}, $$

which yields

$$ D(X_b) \sim \frac{1}{2} \log b, \qquad D(X_b + Y_b) \sim \frac{1}{3} \log b, $$

as $b \to +\infty$.

In [BCG1, BCG2] a slightly different question, raised by M. Kac and
H. P. McKean [McK] (with the desire to quantify in terms of entropy the
Cramér characterization of the normal law), has been answered. Namely,
it was shown that $D(X + Y)$ may be as small as we wish, while $D(X)$ is
separated from zero. In the examples of [BCG2], $D(X)$ is of order 1, while
for Theorem 1.3 it was necessary to use arbitrarily large values of $D(X)$.
In addition, the distributions in [BCG1, BCG2] are not convex.